An Optimized Cost-Free Learning Using ABC-SVM Approach in the Class Imbalance Problem
Abstract
In this work, cost-free learning (CFL) is formally defined in comparison with cost-sensitive learning (CSL). The primary difference between them is that, even in the class imbalance problem, a CFL approach provides optimal classification results without requiring any cost information. Several CFL approaches already exist in related studies, such as sampling and some criteria-based approaches. Yet, to the best of our knowledge, none of the existing CFL and CSL approaches can process abstaining classifications properly when no information is given about errors and rejects. Hence, based on information theory, we propose a novel CFL strategy that seeks to maximize the normalized mutual information between the targets and the decision outputs of classifiers. With this strategy, we can handle binary or multi-class classification with or without abstention. Important features are observed in the new strategy. In particular, when the degree of class imbalance changes, the proposed strategy balances the errors and rejects accordingly and automatically. The proposed ABC-SVM (Artificial Bee Colony-SVM) follows a wrapper paradigm in which the evaluation measure for the imbalanced dataset serves as the objective function with respect to the feature subset, the misclassification cost and the intrinsic parameters of the SVM. The main goal of cost-free ABC-SVM is to directly improve classification performance by simultaneously optimizing the feature subset, the intrinsic parameters and the misclassification cost parameters. Experimental results on various standard benchmark datasets and real-world data with different imbalance ratios show that the proposed method is effective in comparison with commonly used sampling techniques.
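The objective named in the abstract, normalized mutual information between targets and decision outputs, can be illustrated with a minimal sketch. The abstract does not state which normalization is used, so dividing the mutual information by the target entropy H(T) is an assumption here, as are the function name `normalized_mutual_information` and the convention that an optional extra column of the confusion matrix holds rejected (abstained) samples.

```python
import math


def normalized_mutual_information(confusion):
    """Return I(T; Y) / H(T) for a confusion matrix of raw counts.

    Rows are true classes T, columns are decision outputs Y. An extra
    column may hold rejects (abstentions). Normalizing by H(T) is an
    assumption of this sketch, not a detail taken from the abstract.
    """
    total = sum(sum(row) for row in confusion)
    n_cols = len(confusion[0])

    # Marginal distributions of targets (rows) and decisions (columns).
    p_t = [sum(row) / total for row in confusion]
    p_y = [sum(row[j] for row in confusion) / total for j in range(n_cols)]

    # Entropy of the targets; zero if only one class is present.
    h_t = -sum(p * math.log2(p) for p in p_t if p > 0)
    if h_t == 0:
        return 0.0

    # Mutual information, skipping empty cells (0 * log 0 is taken as 0).
    mi = 0.0
    for i, row in enumerate(confusion):
        for j, count in enumerate(row):
            if count > 0:
                p_ij = count / total
                mi += p_ij * math.log2(p_ij / (p_t[i] * p_y[j]))
    return mi / h_t


# Illustrative counts only (hypothetical, 90:10 imbalance).
perfect = [[90, 0], [0, 10]]               # every sample classified correctly
majority_only = [[90, 0], [10, 0]]         # everything assigned to the majority class
with_rejects = [[85, 2, 3], [1, 7, 2]]     # third column counts rejected samples

for name, cm in [("perfect", perfect),
                 ("majority_only", majority_only),
                 ("with_rejects", with_rejects)]:
    print(name, normalized_mutual_information(cm))
```

Under this normalization, a classifier that assigns every sample to the majority class scores zero even though its accuracy is 90%, which is why such a measure needs no explicit cost terms to penalize ignoring the minority class.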
Similar resources
APPLICATION OF THE HYBRID HARMONY SEARCH WITH SUPPORT VECTOR MACHINE FOR IDENTIFICATION AND CLASSIFICATION OF DAMAGED ZONE AROUND UNDERGROUND SPACES
An excavation damage zone (EDZ) can be defined as a rock zone where the rock properties and conditions have been changed by processes related to an excavation. This zone affects the behavior of the rock mass surrounding the construction, reducing the stability and safety factor and increasing the probability of failure of the structure. This paper presents an approach to build a model for the ...
PREDICTION OF SLOPE STABILITY STATE FOR CIRCULAR FAILURE: A HYBRID SUPPORT VECTOR MACHINE WITH HARMONY SEARCH ALGORITHM
The slope stability analysis is routinely performed by engineers to estimate the stability of river training works, road embankments, embankment dams, excavations and retaining walls. This paper presents a new approach to build a model for the prediction of slope stability state. The support vector machine (SVM) is a new machine learning method based on statistical learning theory, which can so...
Optimizing Cost-Sensitive SVM for Imbalanced Data : Connecting Cluster to Classification
Class imbalance is one of the challenging problems for machine learning in many real-world applications, such as coal and gas burst accident monitoring, where the burst premonition data are far scarcer than the normal data yet are precisely the cases of interest. Cost-sensitive adjustment is a typical algorithm-level method for countering data set imbalance. For SVMs classifier,...
Breast Cancer Diagnosis from Perspective of Class Imbalance
Introduction: Breast cancer is the second cause of mortality among women. Early detection is the only way to reduce the risk of breast cancer mortality. Traditional methods cannot effectively diagnose tumors since they are based on the assumption of a well-balanced dataset. However, a hybrid method can help to alleviate the two-class imbalance problem existing in the ...
MMDT: Multi-Objective Memetic Rule Learning from Decision Tree
In this article, a Multi-Objective Memetic Algorithm (MA) for rule learning is proposed. Prediction accuracy and interpretability are two measures that conflict with each other. In this approach, we consider both the accuracy and the interpretability of rule sets. Additionally, individual classifiers face other problems such as huge sizes, high dimensionality and imbalanced class distributions in data sets. This...